Methods for Extracting Meta-Information from Bibliographic Databases

Author

  • Maria Biryukov
Abstract

Owing to the rapid growth of electronically available publications over the last few decades, bibliographic databases have become widespread. They cover a large variety of knowledge fields and provide fast access to a wide range of data. At the same time, they contain a wealth of hidden knowledge that can be inferred only through additional processing. In this work we focus on extracting such implicit (or meta) knowledge from research bibliographic databases by looking at them from sociolinguistic, text mining, and bibliometric perspectives. We choose the Digital Library and Bibliographic Database (DBLP) as a testbed for our experiments. Within the sociolinguistic analysis, we build a statistical system for identifying the language of personal names. We also show that extending a purely statistical model with the co-author network boosts the system's performance. Several premises motivate this work. For example, it has been shown that geographical proximity influences research. Moreover, research is constantly evaluated on a national and international basis. The ability to automatically assign personal names to the appropriate language should make these and similar investigations less laborious in terms of human effort. In the text mining scenario, we perform a number of experiments that focus on topic identification and ranking. While our topic detection approach remains generic and can be used for any kind of textual data, the topic ranking metrics are built upon the information provided by bibliographic databases. With respect to topic ranking, our study aims at finding different ways of ordering topics depending on the question that has


Similar resources

Comparison of Bibliographic Databases in Retrieving Information on Telemedicine

Background & Aims: Some of the main questions that matter to researchers who intend to perform a systematic review in a field of science are: ‘What databases should I use for my review?’; ‘Do all these databases have the same value?’; and ‘Which sources retrieved the highest number of relevant references?’. The main aim of this work was the identification of the best database for ...


A Study of Information Pollution in Selected Medical Databases of Mashhad University of Medical Sciences from the Viewpoint of Faculty Members

Introduction: Considering the increasing use of scientific databases in science production, university faculty's increasing engagement in the process, and concerns about the potentially dubious nature of information sources, especially electronic ones, the phenomenon of information pollution in the databases of Mashhad University of Medical Sciences has been chosen as a proposition...


The Functional Requirements for Bibliographic Records Model: A New Approach to Organizing Bibliographic Elements

Functional Requirements for Bibliographic Records (FRBR) is a conceptual model for the arrangement of bibliographic records in catalogs and databases, which was proposed by IFLA in 1997, following a plan for revising the Anglo-American Cataloging Rules (AACR). This model tends to stand apart from other cataloging rules, and uses a new structure for storing and displaying bibliographic record...


Bilingual PRESRI - Integration of Multiple Research Paper Databases

Collecting all the papers in a research field is a first step towards an exhaustive survey. A number of research paper databases are available for searching papers. However, searchers are compelled to repeat the same search operation for each database if there are multiple databases for a research field. To improve such inefficient searching, we have developed PRESRI, which can construct an exh...


Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...



Publication date: 2010